What is claimed is: 

1. A method for processing a memory instruction, the method comprising:
obtaining a memory instruction; 

obtaining one or more memory address operands; 
creating a virtual memory address using the one or more memory address
operands;

translating the virtual memory address into a physical memory address; and 
executing the memory instruction using a cache controller, 
wherein the cache controller uses the memory instruction and the physical 
memory address to determine whether to access a portion of a local or a remote
cache. 

2. The method of claim 1, wherein the obtaining of the memory instruction 
includes obtaining a scalar memory instruction. 

3. The method of claim 1, wherein the obtaining of the memory instruction 
includes obtaining a memory instruction selected from a group consisting of a scalar 
load instruction, a scalar store instruction, a prefetch instruction, a synchronization 
instruction, and an atomic memory operation (AMO) instruction. 

4. The method of claim 1, wherein the obtaining of the memory instruction
includes obtaining a scalar store instruction, and wherein the method further 
comprises obtaining scalar store data. 

5. The method of claim 1, wherein the translating includes translating the
virtual memory address into a physical memory address using a translation
look-aside buffer (TLB).

6. The method of claim 1, wherein the method further comprises committing
the memory instruction before execution. 
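
Illustrative note (not part of the claims): the flow recited in claims 1 and 5 might be sketched as below. This is a minimal sketch under stated assumptions. The page size, the dictionary-based TLB, the summing of operands, and all names (`PAGE_SIZE`, `make_virtual_address`, `translate`, `CacheController`) are hypothetical. The claims do not state how the cache controller decides between the local and remote cache, so the sketch assumes that synchronization instructions bypass the local cache and that other instructions use it on a hit.

```python
PAGE_SIZE = 4096  # assumed page size; the claims do not specify one


def make_virtual_address(operands):
    """Create a virtual address from address operands (assumed: simple sum)."""
    return sum(operands)


def translate(tlb, vaddr):
    """Translate a virtual address to a physical one using a dict-based TLB model."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn not in tlb:
        raise KeyError(f"TLB miss for virtual page {vpn}")
    return tlb[vpn] * PAGE_SIZE + offset


class CacheController:
    """Toy controller that picks a cache from the instruction and the address."""

    def __init__(self, local_lines):
        self.local_lines = set(local_lines)  # physical addresses held locally

    def execute(self, instruction, paddr):
        # Assumed policy: a synchronization instruction bypasses the local
        # cache; any other instruction uses the local cache only on a hit.
        if instruction != "sync" and paddr in self.local_lines:
            return "local"
        return "remote"


tlb = {2: 7}  # hypothetical mapping: virtual page 2 -> physical frame 7
vaddr = make_virtual_address([2 * PAGE_SIZE, 16])
paddr = translate(tlb, vaddr)
controller = CacheController(local_lines=[paddr])
result = controller.execute("load", paddr)
```

In this sketch the load resolves to the local cache because its physical address is already present there; the same address issued with a `"sync"` instruction is directed to the remote cache.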

7. A computerized method, comprising: 
obtaining a memory request;

storing the memory request in a first memory request container; and 
processing the memory request from the first memory request container by a 
cache controller, wherein the processing includes 

identifying a type of the memory request, 
processing the memory request in a local cache as a function of a first
condition, and

processing the memory request using a second memory request 
container as a function of a second condition, 

wherein the processing of the memory request using the second 
memory request container includes updating a portion of the local cache with a
portion of a remote cache. 

8. The computerized method of claim 7, wherein the obtaining of the memory 
request includes obtaining a memory load or a memory store request. 

9. The computerized method of claim 7, wherein the storing of the memory 
request in the first memory request container includes storing the memory request in 
an Initial Request Queue (IRQ). 

10. The computerized method of claim 7, wherein the processing of the memory
request in the local cache as a function of the first condition includes processing the 
memory request in the local cache when the local cache includes an entry associated 
with the memory request. 

11. The computerized method of claim 7, wherein the processing of the memory
request using a second memory request container as a function of a second condition 
includes processing the memory request using a Forced Order Queue (FOQ) as a 
function of the second condition. 

12. The computerized method of claim 11, wherein the processing of the 
memory request using the FOQ as a function of the second condition includes 
processing the memory request using the FOQ when the memory request matches a 
corresponding request in the FOQ. 

13. The computerized method of claim 11, wherein the processing of the
memory request using the FOQ as a function of the second condition includes 
adding the memory request to the FOQ when the memory request is not initially 
processed by the local cache, and when the memory request does not initially match
a corresponding request already in the FOQ.

14. The computerized method of claim 11, wherein the processing of the
memory request using the FOQ as a function of the second condition includes 
processing the memory request in the FOQ when local cache processing is bypassed. 

15. The computerized method of claim 11, wherein the processing of the
memory request using the FOQ as a function of the second condition includes 
processing the memory request in the FOQ when the memory request includes a 
synchronization request that causes local cache processing to be bypassed. 
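
Illustrative note (not part of the claims): the request handling recited in claims 7 through 15 might be sketched as below. This is a rough model under stated assumptions, not the claimed implementation. The class `LoadStoreUnit`, its methods, the dictionary caches, and the request tuples are all hypothetical. The claims do not fix the order in which the conditions are tested; the sketch assumes the FOQ match is checked before the local cache lookup.

```python
from collections import deque


class LoadStoreUnit:
    """Toy model of an Initial Request Queue (IRQ) and Forced Order Queue (FOQ)."""

    def __init__(self, remote_cache):
        self.irq = deque()          # first memory request container
        self.foq = deque()          # second memory request container
        self.local_cache = {}       # address -> data
        self.remote_cache = remote_cache

    def submit(self, kind, addr=None):
        self.irq.append((kind, addr))

    def process_irq(self):
        """Process one request from the IRQ; return 'local' or 'foq'."""
        kind, addr = self.irq.popleft()
        if kind == "sync":
            # Synchronization requests bypass local cache processing.
            self.foq.append((kind, addr))
            return "foq"
        if any(a == addr for _, a in self.foq):
            # Matches a request already in the FOQ (assumed: same address).
            self.foq.append((kind, addr))
            return "foq"
        if addr in self.local_cache:
            # The local cache holds an entry associated with the request.
            return "local"
        # Not serviced locally and no match in the FOQ: add it to the FOQ.
        self.foq.append((kind, addr))
        return "foq"

    def drain_foq(self):
        """Service FOQ entries in order, updating the local cache from the remote cache."""
        while self.foq:
            kind, addr = self.foq.popleft()
            if kind != "sync":
                self.local_cache[addr] = self.remote_cache[addr]


lsu = LoadStoreUnit(remote_cache={0x40: "A", 0x80: "B"})
lsu.local_cache[0x40] = "A"
lsu.submit("load", 0x40)   # entry present in the local cache
lsu.submit("load", 0x80)   # miss: added to the FOQ
lsu.submit("store", 0x80)  # matches the earlier FOQ entry
lsu.submit("sync")         # bypasses the local cache
outcomes = [lsu.process_irq() for _ in range(4)]
lsu.drain_foq()
```

After the FOQ drains in this sketch, the line at address `0x80` has been copied from the remote cache into the local cache.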

16. A computerized method, comprising: 

adding a set of memory instructions of a first type to a memory instruction 
container; 

executing the set of memory instructions of the first type from the memory
instruction container; 

receiving a memory instruction of a second type; 

receiving a synchronization instruction that temporarily blocks further 
execution of memory instructions of the first type from the memory instruction
container; 

adding an additional set of memory instructions of the first type to the 
memory instruction container; 

executing the memory instruction of the second type; and 
upon such execution of the memory instruction of the second type, executing
the additional set of memory instructions of the first type from the memory
instruction container. 

17. The computerized method of claim 16, wherein the adding of the set of
memory instructions of the first type to the memory instruction container includes
adding the set of memory instructions of the first type to a memory instruction 
queue. 

18. The computerized method of claim 16, wherein the adding of the set of
memory instructions of the first type to the memory instruction container includes
adding a set of scalar memory instructions to the memory instruction container.

19. The computerized method of claim 16, wherein the executing of the set of 
memory instructions of the first type from the memory instruction container includes
accessing a local or remote cache unit.

20. The computerized method of claim 16, wherein the receiving of the memory 
instruction of the second type includes receiving a vector memory instruction. 

21. The computerized method of claim 16, wherein the executing of the memory
instruction of the second type includes accessing a remote cache unit, and updating a 
local cache unit with contents from the remote cache unit. 
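
Illustrative note (not part of the claims): the ordering recited in claims 16 through 21 might be sketched as below, assuming the first type is a scalar memory instruction and the second type is a vector memory instruction (claims 18 and 20). The class `ScalarQueue`, its methods, and the instruction strings are hypothetical. The sketch models only the blocking and release behavior, not cache access.

```python
from collections import deque


class ScalarQueue:
    """Toy memory instruction queue that a synchronization instruction can block."""

    def __init__(self):
        self.queue = deque()
        self.blocked = False
        self.log = []  # records execution order for inspection

    def add(self, instructions):
        self.queue.extend(instructions)

    def sync(self):
        # A synchronization instruction temporarily blocks scalar execution.
        self.blocked = True

    def run(self):
        """Execute queued scalar instructions unless blocked by a sync."""
        while self.queue and not self.blocked:
            self.log.append(("scalar", self.queue.popleft()))

    def vector_done(self, instruction):
        # Executing the vector instruction releases the block.
        self.log.append(("vector", instruction))
        self.blocked = False


q = ScalarQueue()
q.add(["ld r1", "st r2"])
q.run()                      # first set of scalar instructions executes
pending_vector = "vld v0"    # a vector memory instruction is received
q.sync()                     # sync blocks further scalar execution
q.add(["ld r3"])
q.run()                      # nothing executes while blocked
q.vector_done(pending_vector)
q.run()                      # the additional scalar set now executes
```

The resulting log in this sketch shows the vector instruction ordered between the two sets of scalar instructions.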

22. A scalar processor, comprising:
a cache; and 

a scalar load/store unit having a first container, a second container, and a 
cache controller, 

wherein the cache controller obtains a scalar load/store command from the 
first container, the scalar load/store command having a scalar load/store instruction
and one or more addresses, 

wherein the cache services the scalar load/store command when the scalar 
load/store command hits in the cache, 

wherein the scalar load/store command is added to the second container 
when the scalar load/store command misses in the cache, and

wherein one or more lines in the cache are allocated by a remote cache when 
the scalar load/store command is added to the second container. 

23. The scalar processor of claim 22, wherein the first container comprises an 
Initial Request Queue (IRQ), and wherein the second container comprises a Forced
Order Queue (FOQ).

24. The scalar processor of claim 22, wherein the scalar load/store unit further 
includes an address generator to generate one or more physical addresses from the
one or more addresses of the scalar load/store command.

25. The scalar processor of claim 24, wherein the address generator generates the 
one or more physical addresses using a translation look-aside buffer (TLB). 

26. A processing unit, comprising:
a remote cache interface; and 

a scalar processor having a local cache, 

wherein the scalar processor dispatches a memory instruction, obtains one or 
more address operands, creates a virtual address from the one or more address
operands, translates the virtual address into a physical address using a translation 
look-aside buffer (TLB), and executes the memory instruction using a cache 
controller, the cache controller determining whether to obtain data from the local 
cache or to allocate one or more lines of the local cache through the remote cache 
interface.

27. A processing unit, comprising: 
a cache interface; 

a vector processor; and 
a scalar processor,

wherein the scalar processor processes a plurality of scalar memory 
instructions in a memory instruction container, 

wherein the vector processor receives a vector memory instruction, 
wherein the scalar processor receives a synchronization instruction and 
temporarily blocks further processing of scalar memory instructions in the memory
instruction container, 

wherein the vector processor processes the vector memory instruction by 
accessing a cache through the cache interface, and 

wherein the scalar processor continues to process further scalar memory 
instructions in the memory instruction container after the vector processor has
processed the vector memory instruction. 

28. The processing unit of claim 27, wherein the memory instruction container 
comprises a memory instruction queue. 